Overview

Dataset statistics

Number of variables: 41
Number of observations: 59400
Missing cells: 46094
Missing cells (%): 1.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 122.0 MiB
Average record size in memory: 2.1 KiB
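
The two derived figures above can be recomputed from the raw counts. The sketch below is only a sanity check on the reported numbers; the variable names are illustrative and not part of the report.

```python
# Figures copied from the "Dataset statistics" block.
n_variables = 41
n_observations = 59400
missing_cells = 46094
total_mib = 122.0

total_cells = n_variables * n_observations          # every column of every row
missing_pct = 100 * missing_cells / total_cells     # share of empty cells
avg_record_kib = total_mib * 1024 / n_observations  # MiB converted to KiB per row

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_kib:.1f} KiB")
```

Both results round to the values shown in the report (1.9% and 2.1 KiB).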

Variable types

CAT: 29
NUM: 10
BOOL: 2

Reproduction

Analysis started: 2020-04-19 14:49:22.852634
Analysis finished: 2020-04-19 14:56:59.419130
Version: pandas-profiling v2.6.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml
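
As a small worked example, the run time of the analysis follows from the two timestamps above. This sketch assumes both timestamps share the same time zone, which the report does not state.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2020-04-19 14:49:22.852634", FMT)
finished = datetime.strptime("2020-04-19 14:56:59.419130", FMT)

elapsed = finished - started   # a timedelta
print(elapsed, elapsed.total_seconds())
```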

Warnings

date_recorded has a high cardinality: 356 distinct values High cardinality
funder has a high cardinality: 1897 distinct values High cardinality
installer has a high cardinality: 2145 distinct values High cardinality
wpt_name has a high cardinality: 37400 distinct values High cardinality
subvillage has a high cardinality: 19287 distinct values High cardinality
lga has a high cardinality: 125 distinct values High cardinality
ward has a high cardinality: 2092 distinct values High cardinality
scheme_name has a high cardinality: 2696 distinct values High cardinality
extraction_type_group is highly correlated with extraction_type and 1 other field High Correlation
extraction_type is highly correlated with extraction_type_group and 1 other field High Correlation
extraction_type_class is highly correlated with extraction_type and 1 other field High Correlation
management_group is highly correlated with management High Correlation
management is highly correlated with management_group High Correlation
payment_type is highly correlated with payment High Correlation
payment is highly correlated with payment_type High Correlation
quality_group is highly correlated with water_quality High Correlation
water_quality is highly correlated with quality_group High Correlation
quantity_group is highly correlated with quantity High Correlation
quantity is highly correlated with quantity_group High Correlation
source_type is highly correlated with source and 1 other field High Correlation
source is highly correlated with source_type and 1 other field High Correlation
source_class is highly correlated with source and 1 other field High Correlation
waterpoint_type_group is highly correlated with waterpoint_type High Correlation
waterpoint_type is highly correlated with waterpoint_type_group High Correlation
funder has 3635 (6.1%) missing values Missing
installer has 3655 (6.2%) missing values Missing
public_meeting has 3334 (5.6%) missing values Missing
scheme_management has 3877 (6.5%) missing values Missing
scheme_name has 28166 (47.4%) missing values Missing
permit has 3056 (5.1%) missing values Missing
amount_tsh is highly skewed (γ1 = 57.80779995) Skewed
num_private is highly skewed (γ1 = 91.93374999) Skewed
date_recorded only contains datetime values, but is categorical. Consider applying pd.to_datetime() Type
amount_tsh has 41639 (70.1%) zeros Zeros
gps_height has 20438 (34.4%) zeros Zeros
longitude has 1812 (3.1%) zeros Zeros
num_private has 58643 (98.7%) zeros Zeros
population has 21381 (36.0%) zeros Zeros
construction_year has 20709 (34.9%) zeros Zeros
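
The Type warning above recommends parsing date_recorded as dates. The sketch below illustrates the effect with the standard library so that it runs without pandas; with a pandas DataFrame the analogous call would be pd.to_datetime on that column. The three sample strings are taken from the date_recorded frequency table, and the variable names are illustrative.

```python
from datetime import date

# Sample values from the date_recorded frequency table (ISO format, length 10).
raw = ["2011-03-15", "2011-03-17", "2013-02-03"]

parsed = [date.fromisoformat(s) for s in raw]

# Once parsed, values compare and subtract as dates rather than as strings.
span_days = (max(parsed) - min(parsed)).days
print(parsed[0].year, span_days)
```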

Variables

id
Real number (ℝ≥0)

UNIFORM
UNIQUE
Distinct count: 59400
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 37115.13177
Minimum: 0
Maximum: 74247
Zeros: 1
Zeros (%): < 0.1%
Memory size: 3.4 MiB

Quantile statistics

Minimum: 0
5-th percentile: 3730.9
Q1: 18519.75
Median: 37061.5
Q3: 55656.5
95-th percentile: 70564.05
Maximum: 74247
Range: 74247
Interquartile range (IQR): 37136.75

Descriptive statistics

Standard deviation: 21453.12837
Coefficient of variation (CV): 0.5780156866
Kurtosis: -1.201515029
Mean: 37115.13177
Mean absolute deviation (MAD): 18586.04643
Skewness: 0.00262253035
Sum: 2204638827
Variance: 460236716.9
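
Several of the statistics reported for id are derived from others. The sketch below recomputes them from the reported values as a consistency check. It is not how the report itself was generated.

```python
# Values copied from the id statistics.
mean, std = 37115.13177, 21453.12837
q1, q3 = 18519.75, 55656.5
minimum, maximum = 0, 74247

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum

print(cv, variance, iqr, value_range)
```

Up to rounding of the inputs, the results agree with the CV, Variance, IQR and Range lines of the report.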
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 74247.], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
2047 1 < 0.1%
 
72310 1 < 0.1%
 
49805 1 < 0.1%
 
51852 1 < 0.1%
 
62091 1 < 0.1%
 
64138 1 < 0.1%
 
57993 1 < 0.1%
 
60040 1 < 0.1%
 
33413 1 < 0.1%
 
35460 1 < 0.1%
 
Other values (59390) 59390 > 99.9%
 
ValueCountFrequency (%) 
0 1 < 0.1%
 
1 1 < 0.1%
 
2 1 < 0.1%
 
3 1 < 0.1%
 
4 1 < 0.1%
 
ValueCountFrequency (%) 
74247 1 < 0.1%
 
74246 1 < 0.1%
 
74243 1 < 0.1%
 
74242 1 < 0.1%
 
74240 1 < 0.1%
 

amount_tsh
Real number (ℝ≥0)

SKEWED
ZEROS
Distinct count: 98
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 317.6503847
Minimum: 0
Maximum: 350000
Zeros: 41639
Zeros (%): 70.1%
Memory size: 928.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 20
95-th percentile: 1200
Maximum: 350000
Range: 350000
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 2997.574558
Coefficient of variation (CV): 9.436709989
Kurtosis: 4903.543102
Mean: 317.6503847
Mean absolute deviation (MAD): 522.1244629
Skewness: 57.80779995
Sum: 18868432.85
Variance: 8985453.232
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.000e+00 1.000e-01 3.500e+00 6.500e+00 8.000e+00 ... 2.250e+04 5.500e+04 1.085e+05 1.185e+05 3.500e+05], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 41639 70.1%
 
500 3102 5.2%
 
50 2472 4.2%
 
1000 1488 2.5%
 
20 1463 2.5%
 
200 1220 2.1%
 
100 816 1.4%
 
10 806 1.4%
 
30 743 1.3%
 
2000 704 1.2%
 
Other values (88) 4947 8.3%
 
ValueCountFrequency (%) 
0 41639 70.1%
 
0.2 3 < 0.1%
 
0.25 1 < 0.1%
 
1 3 < 0.1%
 
2 13 < 0.1%
 
ValueCountFrequency (%) 
350000 1 < 0.1%
 
250000 1 < 0.1%
 
200000 1 < 0.1%
 
170000 1 < 0.1%
 
138000 1 < 0.1%
 

date_recorded
Categorical

HIGH CARDINALITY
TYPE DATE
Distinct count: 356
Unique (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 3.4 MiB
ValueCountFrequency (%) 
2011-03-15 572 1.0%
 
2011-03-17 558 0.9%
 
2013-02-03 546 0.9%
 
2011-03-14 520 0.9%
 
2011-03-16 513 0.9%
 
2011-03-18 497 0.8%
 
2011-03-19 466 0.8%
 
2013-02-04 464 0.8%
 
2013-01-29 459 0.8%
 
2011-03-04 458 0.8%
 
Other values (346) 54347 91.5%
 

Length

Max length: 10
Mean length: 10
Min length: 10
ValueCountFrequency (%) 
Decimal_Number 10 90.9%
 
Dash_Punctuation 1 9.1%
 
ValueCountFrequency (%) 
Common 11 100.0%
 
ValueCountFrequency (%) 
ASCII 11 100.0%
 

funder
Categorical

HIGH CARDINALITY
MISSING
Distinct count: 1897
Unique (%): 3.4%
Missing: 3635
Missing (%): 6.1%
Memory size: 3.4 MiB
ValueCountFrequency (%) 
Government Of Tanzania 9084 15.3%
 
Danida 3114 5.2%
 
Hesawa 2202 3.7%
 
Rwssp 1374 2.3%
 
World Bank 1349 2.3%
 
Kkkt 1287 2.2%
 
World Vision 1246 2.1%
 
Unicef 1057 1.8%
 
Tasaf 877 1.5%
 
District Council 843 1.4%
 
Other values (1887) 33332 56.1%
 
(Missing) 3635 6.1%
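
The two percentages reported for funder appear to use different denominators: Missing (%) is relative to all rows, while Unique (%) matches the reported value only when divided by the non-missing rows. The sketch below checks this from the reported counts. The interpretation is inferred from the numbers, not from the tool's documentation.

```python
n_rows = 59400
distinct, missing = 1897, 3635           # funder figures from the report

non_missing = n_rows - missing           # rows that carry a funder value
unique_pct = 100 * distinct / non_missing
unique_pct_all_rows = 100 * distinct / n_rows
missing_pct = 100 * missing / n_rows

print(f"{unique_pct:.1f}% distinct among non-missing rows")
print(f"{unique_pct_all_rows:.1f}% if all rows were the denominator")
print(f"{missing_pct:.1f}% missing")
```

The same check on scheme_name (2696 distinct, 28166 missing) also reproduces its reported 8.6% only with the non-missing denominator.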
 

Length

Max length: 30
Mean length: 9.505824916
Min length: 1
ValueCountFrequency (%) 
Uppercase_Letter 26 37.7%
 
Lowercase_Letter 26 37.7%
 
Decimal_Number 5 7.2%
 
Other_Punctuation 5 7.2%
 
Open_Punctuation 2 2.9%
 
Close_Punctuation 2 2.9%
 
Space_Separator 1 1.4%
 
Connector_Punctuation 1 1.4%
 
Dash_Punctuation 1 1.4%
 
ValueCountFrequency (%) 
Latin 52 75.4%
 
Common 17 24.6%
 
ValueCountFrequency (%) 
ASCII 69 100.0%
 

gps_height
Real number (ℝ)

ZEROS
Distinct count: 2428
Unique (%): 4.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 668.2972391
Minimum: -90
Maximum: 2770
Zeros: 20438
Zeros (%): 34.4%
Memory size: 3.4 MiB

Quantile statistics

Minimum: -90
5-th percentile: 0
Q1: 0
Median: 369
Q3: 1319.25
95-th percentile: 1797
Maximum: 2770
Range: 2860
Interquartile range (IQR): 1319.25

Descriptive statistics

Standard deviation: 693.1163503
Coefficient of variation (CV): 1.037137833
Kurtosis: -1.292440135
Mean: 668.2972391
Mean absolute deviation (MAD): 637.9529678
Skewness: 0.462402085
Sum: 39696856
Variance: 480410.2751
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ -90. -58. -50.5 -40.5 -28.5 ... 2180.5 2200.5 2366.5 2627.5 2770. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 20438 34.4%
 
-15 60 0.1%
 
-16 55 0.1%
 
-13 55 0.1%
 
-20 52 0.1%
 
1290 52 0.1%
 
-14 51 0.1%
 
303 51 0.1%
 
-18 49 0.1%
 
-19 47 0.1%
 
Other values (2418) 38490 64.8%
 
ValueCountFrequency (%) 
-90 1 < 0.1%
 
-63 2 < 0.1%
 
-59 1 < 0.1%
 
-57 1 < 0.1%
 
-55 1 < 0.1%
 
ValueCountFrequency (%) 
2770 1 < 0.1%
 
2628 1 < 0.1%
 
2627 1 < 0.1%
 
2626 2 < 0.1%
 
2623 1 < 0.1%
 

installer
Categorical

HIGH CARDINALITY
MISSING
Distinct count: 2145
Unique (%): 3.8%
Missing: 3655
Missing (%): 6.2%
Memory size: 3.4 MiB
ValueCountFrequency (%) 
DWE 17402 29.3%
 
Government 1825 3.1%
 
RWE 1206 2.0%
 
Commu 1060 1.8%
 
DANIDA 1050 1.8%
 
KKKT 898 1.5%
 
Hesawa 840 1.4%
 
0 777 1.3%
 
TCRS 707 1.2%
 
Central government 622 1.0%
 
Other values (2135) 29358 49.4%
 
(Missing) 3655 6.2%
 

Length

Max length: 30
Mean length: 5.91976431
Min length: 1
ValueCountFrequency (%) 
Uppercase_Letter 26 37.1%
 
Lowercase_Letter 26 37.1%
 
Other_Punctuation 5 7.1%
 
Decimal_Number 4 5.7%
 
Close_Punctuation 3 4.3%
 
Open_Punctuation 2 2.9%
 
Space_Separator 1 1.4%
 
Connector_Punctuation 1 1.4%
 
Dash_Punctuation 1 1.4%
 
Currency_Symbol 1 1.4%
 
ValueCountFrequency (%) 
Latin 52 74.3%
 
Common 18 25.7%
 
ValueCountFrequency (%) 
ASCII 70 100.0%
 

longitude
Real number (ℝ≥0)

ZEROS
Distinct count: 57516
Unique (%): 96.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 34.07742669
Minimum: 0
Maximum: 40.34519307
Zeros: 1812
Zeros (%): 3.1%
Memory size: 3.4 MiB

Quantile statistics

Minimum: 0
5-th percentile: 30.04066001
Q1: 33.09034738
Median: 34.90874343
Q3: 37.17838657
95-th percentile: 39.13323954
Maximum: 40.34519307
Range: 40.34519307
Interquartile range (IQR): 4.08803919

Descriptive statistics

Standard deviation: 6.567431846
Coefficient of variation (CV): 0.1927208854
Kurtosis: 19.18703105
Mean: 34.07742669
Mean absolute deviation (MAD): 3.302270448
Skewness: -4.191046455
Sum: 2024199.146
Variance: 43.13116105
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 14.80356095 29.60716149 29.63885954 29.68126761 ... 39.67089348 39.88985935 40.10245293 40.20239876 40.34519307], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 1812 3.1%
 
37.54090064 2 < 0.1%
 
33.01050977 2 < 0.1%
 
39.09348389 2 < 0.1%
 
32.9727187 2 < 0.1%
 
33.00627548 2 < 0.1%
 
39.10395018 2 < 0.1%
 
37.54278497 2 < 0.1%
 
36.80248988 2 < 0.1%
 
39.09837398 2 < 0.1%
 
Other values (57506) 57570 96.9%
 
ValueCountFrequency (%) 
0 1812 3.1%
 
29.6071219 1 < 0.1%
 
29.60720109 1 < 0.1%
 
29.61032056 1 < 0.1%
 
29.61096482 1 < 0.1%
 
ValueCountFrequency (%) 
40.34519307 1 < 0.1%
 
40.34430089 1 < 0.1%
 
40.32523996 1 < 0.1%
 
40.32522643 1 < 0.1%
 
40.32340181 1 < 0.1%
 

latitude
Real number (ℝ)

Distinct count: 57517
Unique (%): 96.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -5.70603266
Minimum: -11.64944018
Maximum: -2e-08
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.4 MiB

Quantile statistics

Minimum: -11.64944018
5-th percentile: -10.58554992
Q1: -8.540621305
Median: -5.02159665
Q3: -3.32615564
95-th percentile: -1.408872227
Maximum: -2e-08
Range: 11.64944016
Interquartile range (IQR): 5.214465665

Descriptive statistics

Standard deviation: 2.946019081
Coefficient of variation (CV): -0.5162990219
Kurtosis: -1.057616666
Mean: -5.70603266
Mean absolute deviation (MAD): 2.56776991
Skewness: -0.1520365709
Sum: -338938.34
Variance: 8.679028427
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-1.16494402e+01 -1.15676907e+01 -1.14763553e+01 -1.14412923e+01 -1.13237074e+01 ... -1.19709397e+00 -1.14437511e+00 -9.98690175e-01 -4.99232185e-01 -2.00000000e-08], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
-2e-08 1812 3.1%
 
-6.98584173 2 < 0.1%
 
-3.79757861 2 < 0.1%
 
-6.98188419 2 < 0.1%
 
-7.10462503 2 < 0.1%
 
-7.05692253 2 < 0.1%
 
-7.17517443 2 < 0.1%
 
-6.99073094 2 < 0.1%
 
-6.9787555 2 < 0.1%
 
-6.99470401 2 < 0.1%
 
Other values (57507) 57570 96.9%
 
ValueCountFrequency (%) 
-11.64944018 1 < 0.1%
 
-11.64837759 1 < 0.1%
 
-11.58629656 1 < 0.1%
 
-11.56857679 1 < 0.1%
 
-11.56680457 1 < 0.1%
 
ValueCountFrequency (%) 
-2e-08 1812 3.1%
 
-0.99846435 1 < 0.1%
 
-0.998916 1 < 0.1%
 
-0.99901209 1 < 0.1%
 
-0.99911702 1 < 0.1%
 

wpt_name
Categorical

HIGH CARDINALITY
Distinct count: 37400
Unique (%): 63.0%
Missing: 0
Missing (%): 0.0%
Memory size: 3.4 MiB
ValueCountFrequency (%) 
none 3563 6.0%
 
Shuleni 1748 2.9%
 
Zahanati 830 1.4%
 
Msikitini 535 0.9%
 
Kanisani 323 0.5%
 
Bombani 271 0.5%
 
Sokoni 260 0.4%
 
Ofisini 254 0.4%
 
School 208 0.4%
 
Shule Ya Msingi 199 0.3%
 
Other values (37390) 51209 86.2%
 

Length

Max length: 30
Mean length: 10.96210438
Min length: 1
ValueCountFrequency (%) 
Uppercase_Letter 26 34.7%
 
Lowercase_Letter 26 34.7%
 
Decimal_Number 10 13.3%
 
Other_Punctuation 5 6.7%
 
Open_Punctuation 2 2.7%
 
Close_Punctuation 2 2.7%
 
Space_Separator 1 1.3%
 
Modifier_Symbol 1 1.3%
 
Connector_Punctuation 1 1.3%
 
Dash_Punctuation 1 1.3%
 
ValueCountFrequency (%) 
Latin 52 69.3%
 
Common 23 30.7%
 
ValueCountFrequency (%) 
ASCII 75 100.0%
 

num_private
Real number (ℝ≥0)

SKEWED
ZEROS
Distinct count: 65
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4741414141
Minimum: 0
Maximum: 1776
Zeros: 58643
Zeros (%): 98.7%
Memory size: 3.4 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 1776
Range: 1776
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 12.23622981
Coefficient of variation (CV): 25.80713147
Kurtosis: 11137.29521
Mean: 0.4741414141
Mean absolute deviation (MAD): 0.9361978097
Skewness: 91.93374999
Sum: 28164
Variance: 149.72532
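
The MAD value reported for num_private is consistent with a mean absolute deviation, not a median absolute deviation. With 98.7% zeros the median is 0, so a median-based deviation would also be 0. The sketch below recovers the mean absolute deviation from the reported counts alone. It assumes every non-zero value is at least 1, which matches the smallest values listed for this variable (0, 1, 2, 3, 4).

```python
# Figures copied from the num_private statistics.
n_rows = 59400
zeros = 58643
total = 28164                     # reported Sum
mean = total / n_rows             # matches the reported Mean

# Each zero lies `mean` below the mean; each non-zero value (>= 1, by
# assumption) lies above it, so the absolute values can be dropped.
nonzero = n_rows - zeros
abs_dev_zeros = zeros * mean
abs_dev_nonzero = total - nonzero * mean   # sum of (x - mean) over non-zero rows
mean_abs_dev = (abs_dev_zeros + abs_dev_nonzero) / n_rows

print(round(mean_abs_dev, 10))
```

The result agrees with the reported MAD to the printed precision.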
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.000e+00 5.000e-01 1.500e+00 4.500e+00 5.500e+00 ... 9.800e+01 1.065e+02 1.550e+02 7.265e+02 1.776e+03], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 58643 98.7%
 
6 81 0.1%
 
1 73 0.1%
 
5 46 0.1%
 
8 46 0.1%
 
32 40 0.1%
 
45 36 0.1%
 
15 35 0.1%
 
39 30 0.1%
 
93 28 < 0.1%
 
Other values (55) 342 0.6%
 
ValueCountFrequency (%) 
0 58643 98.7%
 
1 73 0.1%
 
2 23 < 0.1%
 
3 27 < 0.1%
 
4 20 < 0.1%
 
ValueCountFrequency (%) 
1776 1 < 0.1%
 
1402 1 < 0.1%
 
755 1 < 0.1%
 
698 1 < 0.1%
 
672 1 < 0.1%
 

basin
Categorical

Distinct count: 9
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.4 MiB
ValueCountFrequency (%) 
Lake Victoria 10248 17.3%
 
Pangani 8940 15.1%
 
Rufiji 7976 13.4%
 
Internal 7785 13.1%
 
Lake Tanganyika 6432 10.8%
 
Wami / Ruvu 5987 10.1%
 
Lake Nyasa 5085 8.6%
 
Ruvuma / Southern Coast 4493 7.6%
 
Lake Rukwa 2454 4.1%
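
The frequency column of a table like this one is each count divided by the total number of rows. A minimal sketch using the basin counts from the table, with illustrative variable names:

```python
# Counts copied from the basin frequency table.
counts = {
    "Lake Victoria": 10248,
    "Pangani": 8940,
    "Rufiji": 7976,
    "Internal": 7785,
    "Lake Tanganyika": 6432,
    "Wami / Ruvu": 5987,
    "Lake Nyasa": 5085,
    "Ruvuma / Southern Coast": 4493,
    "Lake Rukwa": 2454,
}

total = sum(counts.values())   # should equal the number of observations
for name, count in counts.items():
    print(f"{name:<24} {count:>6} {100 * count / total:5.1f}%")
```

Because basin has no missing values, the nine counts add up to the 59400 observations reported in the overview.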
 

Length

Max length: 23
Mean length: 10.8923569
Min length: 6
ValueCountFrequency (%) 
Lowercase_Letter 20 62.5%
 
Uppercase_Letter 10 31.2%
 
Space_Separator 1 3.1%
 
Other_Punctuation 1 3.1%
 
ValueCountFrequency (%) 
Latin 30 93.8%
 
Common 2 6.2%
 
ValueCountFrequency (%) 
ASCII 32 100.0%
 

subvillage
Categorical

HIGH CARDINALITY
Distinct count: 19287
Unique (%): 32.7%
Missing: 371
Missing (%): 0.6%
Memory size: 3.4 MiB
ValueCountFrequency (%) 
Madukani 508 0.9%
 
Shuleni 506 0.9%
 
Majengo 502 0.8%
 
Kati 373 0.6%
 
Mtakuja 262 0.4%
 
Sokoni 232 0.4%
 
M 187 0.3%
 
Muungano 172 0.3%
 
Mbuyuni 164 0.3%
 
Mlimani 152 0.3%
 
Other values (19277) 55971 94.2%
 
(Missing) 371 0.6%
 

Length

Max length: 30
Mean length: 7.867003367
Min length: 1
ValueCountFrequency (%) 
Lowercase_Letter 26 35.6%
 
Uppercase_Letter 25 34.2%
 
Decimal_Number 10 13.7%
 
Other_Punctuation 4 5.5%
 
Open_Punctuation 2 2.7%
 
Close_Punctuation 2 2.7%
 
Space_Separator 1 1.4%
 
Modifier_Symbol 1 1.4%
 
Connector_Punctuation 1 1.4%
 
Dash_Punctuation 1 1.4%
 
ValueCountFrequency (%) 
Latin 51 69.9%
 
Common 22 30.1%
 
ValueCountFrequency (%) 
ASCII 73 100.0%
 

region
Categorical

Distinct count: 21
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.4 MiB
ValueCountFrequency (%) 
Iringa 5294 8.9%
 
Shinyanga 4982 8.4%
 
Mbeya 4639 7.8%
 
Kilimanjaro 4379 7.4%
 
Morogoro 4006 6.7%
 
Arusha 3350 5.6%
 
Kagera 3316 5.6%
 
Mwanza 3102 5.2%
 
Kigoma 2816 4.7%
 
Ruvuma 2640 4.4%
 
Other values (11) 20876 35.1%
 

Length

Max length: 13
Mean length: 6.623754209
Min length: 4
ValueCountFrequency (%) 
Lowercase_Letter 21 65.6%
 
Uppercase_Letter 10 31.2%
 
Space_Separator 1 3.1%
 
ValueCountFrequency (%) 
Latin 31 96.9%
 
Common 1 3.1%
 
ValueCountFrequency (%) 
ASCII 32 100.0%
 

region_code
Real number (ℝ≥0)

Distinct count: 27
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.29700337
Minimum: 1
Maximum: 99
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.4 MiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 5
Median: 12
Q3: 17
95-th percentile: 60
Maximum: 99
Range: 98
Interquartile range (IQR): 12

Descriptive statistics

Standard deviation: 17.58740634
Coefficient of variation (CV): 1.149728866
Kurtosis: 10.28843341
Mean: 15.29700337
Mean absolute deviation (MAD): 9.486968586
Skewness: 3.17381811
Sum: 908642
Variance: 309.3168617
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 1. 1.5 2.5 3.5 4.5 ... 32. 50. 70. 85. 99. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
11 5300 8.9%
 
17 5011 8.4%
 
12 4639 7.8%
 
3 4379 7.4%
 
5 4040 6.8%
 
18 3324 5.6%
 
19 3047 5.1%
 
2 3024 5.1%
 
16 2816 4.7%
 
10 2640 4.4%
 
Other values (17) 21180 35.7%
 
ValueCountFrequency (%) 
1 2201 3.7%
 
2 3024 5.1%
 
3 4379 7.4%
 
4 2513 4.2%
 
5 4040 6.8%
 
ValueCountFrequency (%) 
99 423 0.7%
 
90 917 1.5%
 
80 1238 2.1%
 
60 1025 1.7%
 
40 1 < 0.1%
 

district_code
Real number (ℝ≥0)

Distinct count: 20
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.629747475
Minimum: 0
Maximum: 80
Zeros: 23
Zeros (%): < 0.1%
Memory size: 3.4 MiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 2
Median: 3
Q3: 5
95-th percentile: 30
Maximum: 80
Range: 80
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 9.633648629
Coefficient of variation (CV): 1.711204396
Kurtosis: 16.21428363
Mean: 5.629747475
Mean absolute deviation (MAD): 4.743533803
Skewness: 3.962045299
Sum: 334407
Variance: 92.80718592
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 2.5 3.5 ... 48. 56.5 61. 65. 80. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
1 12203 20.5%
 
2 11173 18.8%
 
3 9998 16.8%
 
4 8999 15.1%
 
5 4356 7.3%
 
6 4074 6.9%
 
7 3343 5.6%
 
8 1043 1.8%
 
30 995 1.7%
 
33 874 1.5%
 
Other values (10) 2342 3.9%
 
ValueCountFrequency (%) 
0 23 < 0.1%
 
1 12203 20.5%
 
2 11173 18.8%
 
3 9998 16.8%
 
4 8999 15.1%
 
ValueCountFrequency (%) 
80 12 < 0.1%
 
67 6 < 0.1%
 
63 195 0.3%
 
62 109 0.2%
 
60 63 0.1%
 

lga
Categorical

HIGH CARDINALITY
Distinct count: 125
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 3.4 MiB
ValueCountFrequency (%) 
Njombe 2503 4.2%
 
Arusha Rural 1252 2.1%
 
Moshi Rural 1251 2.1%
 
Bariadi 1177 2.0%
 
Rungwe 1106 1.9%
 
Kilosa 1094 1.8%
 
Kasulu 1047 1.8%
 
Mbozi 1034 1.7%
 
Meru 1009 1.7%
 
Bagamoyo 997 1.7%
 
Other values (115) 46930 79.0%
 

Length

Max length: 16
Mean length: 7.416885522
Min length: 3
ValueCountFrequency (%) 
Lowercase_Letter 24 58.5%
 
Uppercase_Letter 16 39.0%
 
Space_Separator 1 2.4%
 
ValueCountFrequency (%) 
Latin 40 97.6%
 
Common 1 2.4%
 
ValueCountFrequency (%) 
ASCII 41 100.0%
 

ward
Categorical

HIGH CARDINALITY
Distinct count: 2092
Unique (%): 3.5%
Missing: 0
Missing (%): 0.0%
Memory size: 3.4 MiB
ValueCountFrequency (%) 
Igosi 307 0.5%
 
Imalinyi 252 0.4%
 
Siha Kati 232 0.4%
 
Mdandu 231 0.4%
 
Nduruma 217 0.4%
 
Kitunda 203 0.3%
 
Mishamo 203 0.3%
 
Msindo 201 0.3%
 
Chalinze 196 0.3%
 
Maji ya Chai 190 0.3%
 
Other values (2082) 57168 96.2%
 

Length

Max length: 23
Mean length: 7.505841751
Min length: 3
ValueCountFrequency (%) 
Uppercase_Letter 25 46.3%
 
Lowercase_Letter 25 46.3%
 
Other_Punctuation 2 3.7%
 
Space_Separator 1 1.9%
 
Dash_Punctuation 1 1.9%
 
ValueCountFrequency (%) 
Latin 50 92.6%
 
Common 4 7.4%
 
ValueCountFrequency (%) 
ASCII 54 100.0%
 

population
Real number (ℝ≥0)

ZEROS
Distinct count: 1049
Unique (%): 1.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 179.9099832
Minimum: 0
Maximum: 30500
Zeros: 21381
Zeros (%): 36.0%
Memory size: 3.4 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 25
Q3: 215
95-th percentile: 680
Maximum: 30500
Range: 30500
Interquartile range (IQR): 215

Descriptive statistics

Standard deviation: 471.4821757
Coefficient of variation (CV): 2.620655994
Kurtosis: 402.2801153
Mean: 179.9099832
Mean absolute deviation (MAD): 214.6976938
Skewness: 12.66071359
Sum: 10686653
Variance: 222295.442
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.00000e+00 5.00000e-01 1.50000e+00 4.50000e+00 5.50000e+00 ... 5.00800e+03 6.88800e+03 6.96100e+03 1.07315e+04 3.05000e+04], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 21381 36.0%
 
1 7025 11.8%
 
200 1940 3.3%
 
150 1892 3.2%
 
250 1681 2.8%
 
300 1476 2.5%
 
100 1146 1.9%
 
50 1139 1.9%
 
500 1009 1.7%
 
350 986 1.7%
 
Other values (1039) 19725 33.2%
 
ValueCountFrequency (%) 
0 21381 36.0%
 
1 7025 11.8%
 
2 4 < 0.1%
 
3 4 < 0.1%
 
4 13 < 0.1%
 
ValueCountFrequency (%) 
30500 1 < 0.1%
 
15300 1 < 0.1%
 
11463 1 < 0.1%
 
10000 3 < 0.1%
 
9865 1 < 0.1%
 

public_meeting
Boolean

MISSING
Distinct count: 2
Unique (%): < 0.1%
Missing: 3334
Missing (%): 5.6%
Memory size: 3.4 MiB
ValueCountFrequency (%) 
True 51011 85.9%
 
False 5055 8.5%
 
(Missing) 3334 5.6%
 

recorded_by
Categorical

CONSTANT
REJECTED
Distinct count: 1
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.4 MiB
ValueCountFrequency (%) 
GeoData Consultants Ltd 59400 100.0%
 

Length

Max length: 23
Mean length: 23
Min length: 23
ValueCountFrequency (%) 
Lowercase_Letter 9 64.3%
 
Uppercase_Letter 4 28.6%
 
Space_Separator 1 7.1%
 
ValueCountFrequency (%) 
Latin 13 92.9%
 
Common 1 7.1%
 
ValueCountFrequency (%) 
ASCII 14 100.0%
 

scheme_management
Categorical

MISSING
Distinct count: 12
Unique (%): < 0.1%
Missing: 3877
Missing (%): 6.5%
Memory size: 3.4 MiB
ValueCountFrequency (%) 
VWC 36793 61.9%
 
WUG 5206 8.8%
 
Water authority 3153 5.3%
 
WUA 2883 4.9%
 
Water Board 2748 4.6%
 
Parastatal 1680 2.8%
 
Private operator 1063 1.8%
 
Company 1061 1.8%
 
Other 766 1.3%
 
SWC 97 0.2%
 
Other values (2) 73 0.1%
 
(Missing) 3877 6.5%
 

Length

Max length: 16
Mean length: 4.537373737
Min length: 3
ValueCountFrequency (%) 
Lowercase_Letter 16 55.2%
 
Uppercase_Letter 12 41.4%
 
Space_Separator 1 3.4%
 
ValueCountFrequency (%) 
Latin 28 96.6%
 
Common 1 3.4%
 
ValueCountFrequency (%) 
ASCII 29 100.0%
 

scheme_name
Categorical

HIGH CARDINALITY
MISSING
Distinct count: 2696
Unique (%): 8.6%
Missing: 28166
Missing (%): 47.4%
Memory size: 3.4 MiB
ValueCountFrequency (%) 
K 682 1.1%
 
None 644 1.1%
 
Borehole 546 0.9%
 
Chalinze wate 405 0.7%
 
M 400 0.7%
 
DANIDA 379 0.6%
 
Government 320 0.5%
 
Ngana water supplied scheme 270 0.5%
 
wanging'ombe water supply s 261 0.4%
 
wanging'ombe supply scheme 234 0.4%
 
Other values (2686) 27093 45.6%
 
(Missing) 28166 47.4%
 

Length

Max length: 46
Mean length: 8.94456229
Min length: 1
ValueCountFrequency (%) 
Lowercase_Letter 26 38.2%
 
Uppercase_Letter 25 36.8%
 
Decimal_Number 8 11.8%
 
Other_Punctuation 4 5.9%
 
Space_Separator 1 1.5%
 
Modifier_Symbol 1 1.5%
 
Open_Punctuation 1 1.5%
 
Dash_Punctuation 1 1.5%
 
Close_Punctuation 1 1.5%
 
ValueCountFrequency (%) 
Latin 51 75.0%
 
Common 17 25.0%
 
ValueCountFrequency (%) 
ASCII 68 100.0%
 

permit
Boolean

MISSING
Distinct count: 2
Unique (%): < 0.1%
Missing: 3056
Missing (%): 5.1%
Memory size: 3.4 MiB
ValueCountFrequency (%) 
True 38852 65.4%
 
False 17492 29.4%
 
(Missing) 3056 5.1%
 

construction_year
Real number (ℝ≥0)

ZEROS
Distinct count: 55
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1300.652475
Minimum: 0
Maximum: 2013
Zeros: 20709
Zeros (%): 34.9%
Memory size: 3.4 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 1986
Q3: 2004
95-th percentile: 2010
Maximum: 2013
Range: 2013
Interquartile range (IQR): 2004

Descriptive statistics

Standard deviation: 951.6205473
Coefficient of variation (CV): 0.7316485885
Kurtosis: -1.596432369
Mean: 1300.652475
Mean absolute deviation (MAD): 906.9094983
Skewness: -0.6349277866
Sum: 77258757
Variance: 905581.6661
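
The mean of construction_year (about 1300) is not a plausible year because 34.9% of the values are 0. Since zeros contribute nothing to the sum, the mean over the non-zero rows can be recovered from the reported figures, as sketched below. Treating 0 as a placeholder for an unknown year is an assumption; the report only shows that the next smallest value is 1960.

```python
# Figures copied from the construction_year statistics.
n_rows = 59400
zeros = 20709
total = 77258757                           # reported Sum

mean_all = total / n_rows                  # reproduces the reported Mean
mean_nonzero = total / (n_rows - zeros)    # mean over rows with a non-zero year

print(round(mean_all, 2), round(mean_nonzero, 2))
```

The non-zero mean falls between 1996 and 1997, which is in line with the reported median (1986) and third quartile (2004).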
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 980. 1960.5 1962.5 1963.5 ... 2007.5 2010.5 2011.5 2012.5 2013. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 20709 34.9%
 
2010 2645 4.5%
 
2008 2613 4.4%
 
2009 2533 4.3%
 
2000 2091 3.5%
 
2007 1587 2.7%
 
2006 1471 2.5%
 
2003 1286 2.2%
 
2011 1256 2.1%
 
2004 1123 1.9%
 
Other values (45) 22086 37.2%
 
ValueCountFrequency (%) 
0 20709 34.9%
 
1960 102 0.2%
 
1961 21 < 0.1%
 
1962 30 0.1%
 
1963 85 0.1%
 
ValueCountFrequency (%) 
2013 176 0.3%
 
2012 1084 1.8%
 
2011 1256 2.1%
 
2010 2645 4.5%
 
2009 2533 4.3%
 

extraction_type
Categorical

HIGH CORRELATION
Distinct count: 18
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.4 MiB
ValueCountFrequency (%) 
gravity 26780 45.1%
 
nira/tanira 8154 13.7%
 
other 6430 10.8%
 
submersible 4764 8.0%
 
swn 80 3670 6.2%
 
mono 2865 4.8%
 
india mark ii 2400 4.0%
 
afridev 1770 3.0%
 
ksb 1415 2.4%
 
other - rope pump 451 0.8%
 
Other values (8) 701 1.2%
 

Length

Max length: 25
Mean length: 7.719511785
Min length: 3
ValueCountFrequency (%) 
Lowercase_Letter 23 79.3%
 
Decimal_Number 3 10.3%
 
Space_Separator 1 3.4%
 
Dash_Punctuation 1 3.4%
 
Other_Punctuation 1 3.4%
 
ValueCountFrequency (%) 
Latin 23 79.3%
 
Common 6 20.7%
 
ValueCountFrequency (%) 
ASCII 29 100.0%
 

extraction_type_group
Categorical

HIGH CORRELATION
Distinct count: 13
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.4 MiB
ValueCountFrequency (%) 
gravity 26780 45.1%
 
nira/tanira 8154 13.7%
 
other 6430 10.8%
 
submersible 6179 10.4%
 
swn 80 3670 6.2%
 
mono 2865 4.8%
 
india mark ii 2400 4.0%
 
afridev 1770 3.0%
 
rope pump 451 0.8%
 
other handpump 364 0.6%
 
Other values (3) 337 0.6%
 

Length

Max length: 15
Mean length: 7.880538721
Min length: 4
ValueCountFrequency (%) 
Lowercase_Letter 21 80.8%
 
Decimal_Number 2 7.7%
 
Space_Separator 1 3.8%
 
Dash_Punctuation 1 3.8%
 
Other_Punctuation 1 3.8%
 
ValueCountFrequency (%) 
Latin 21 80.8%
 
Common 5 19.2%
 
ValueCountFrequency (%) 
ASCII 26 100.0%
 

extraction_type_class
Categorical

HIGH CORRELATION
Distinct count: 7
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.4 MiB
ValueCountFrequency (%) 
gravity 26780 45.1%
 
handpump 16456 27.7%
 
other 6430 10.8%
 
submersible 6179 10.4%
 
motorpump 2987 5.0%
 
rope pump 451 0.8%
 
wind-powered 117 0.2%
 

Length

Max length: 12
Mean length: 7.602239057
Min length: 5
ValueCountFrequency (%) 
Lowercase_Letter 19 90.5%
 
Space_Separator 1 4.8%
 
Dash_Punctuation 1 4.8%
 
ValueCountFrequency (%) 
Latin 19 90.5%
 
Common 2 9.5%
 
ValueCountFrequency (%) 
ASCII 21 100.0%
 

management
Categorical

HIGH CORRELATION
Distinct count: 12
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.4 MiB
ValueCountFrequency (%) 
vwc 40507 68.2%
 
wug 6515 11.0%
 
water board 2933 4.9%
 
wua 2535 4.3%
 
private operator 1971 3.3%
 
parastatal 1768 3.0%
 
water authority 904 1.5%
 
other 844 1.4%
 
company 685 1.2%
 
unknown 561 0.9%
 
Other values (2) 177 0.3%
 

Length

Max length: 16
Mean length: 4.350639731
Min length: 3
ValueCountFrequency (%) 
Lowercase_Letter 21 91.3%
 
Space_Separator 1 4.3%
 
Dash_Punctuation 1 4.3%
 
ValueCountFrequency (%) 
Latin 21 91.3%
 
Common 2 8.7%
 
ValueCountFrequency (%) 
ASCII 23 100.0%
 

management_group
Categorical

HIGH CORRELATION
Distinct count: 5
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.4 MiB
ValueCountFrequency (%) 
user-group 52490 88.4%
 
commercial 3638 6.1%
 
parastatal 1768 3.0%
 
other 943 1.6%
 
unknown 561 0.9%
 

Length

Max length: 10
Mean length: 9.892289562
Min length: 5
ValueCountFrequency (%) 
Lowercase_Letter 17 94.4%
 
Dash_Punctuation 1 5.6%
 
ValueCountFrequency (%) 
Latin 17 94.4%
 
Common 1 5.6%
 
ValueCountFrequency (%) 
ASCII 18 100.0%
 

payment
Categorical

HIGH CORRELATION
Distinct count: 7
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.4 MiB
ValueCountFrequency (%) 
never pay 25348 42.7%
 
pay per bucket 8985 15.1%
 
pay monthly 8300 14.0%
 
unknown 8157 13.7%
 
pay when scheme fails 3914 6.6%
 
pay annually 3642 6.1%
 
other 1054 1.8%
 

Length

Max length: 21
Mean length: 10.66479798
Min length: 5
ValueCountFrequency (%) 
Lowercase_Letter 20 95.2%
 
Space_Separator 1 4.8%
 
ValueCountFrequency (%) 
Latin 20 95.2%
 
Common 1 4.8%
 
ValueCountFrequency (%) 
ASCII 21 100.0%
 

payment_type
Categorical

HIGH CORRELATION
Distinct count: 7
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.4 MiB
ValueCountFrequency (%) 
never pay 25348 42.7%
 
per bucket 8985 15.1%
 
monthly 8300 14.0%
 
unknown 8157 13.7%
 
on failure 3914 6.6%
 
annually 3642 6.1%
 
other 1054 1.8%
 

Length

Max length: 10
Mean length: 8.530757576
Min length: 5
ValueCountFrequency (%) 
Lowercase_Letter 19 95.0%
 
Space_Separator 1 5.0%
 
ValueCountFrequency (%) 
Latin 19 95.0%
 
Common 1 5.0%
 
ValueCountFrequency (%) 
ASCII 20 100.0%
 

water_quality
Categorical

HIGH CORRELATION
Distinct count: 8
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.4 MiB
ValueCountFrequency (%) 
soft 50818 85.6%
 
salty 4856 8.2%
 
unknown 1876 3.2%
 
milky 804 1.4%
 
coloured 490 0.8%
 
salty abandoned 339 0.6%
 
fluoride 200 0.3%
 
fluoride abandoned 17 < 0.1%
 

Length

Max length: 18
Mean length: 4.303282828
Min length: 4
ValueCountFrequency (%) 
Lowercase_Letter 18 94.7%
 
Space_Separator 1 5.3%
 
ValueCountFrequency (%) 
Latin 18 94.7%
 
Common 1 5.3%
 
ValueCountFrequency (%) 
ASCII 19 100.0%
 

quality_group
Categorical

HIGH CORRELATION
Distinct count: 6
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.4 MiB
ValueCountFrequency (%) 
good 50818 85.6%
 
salty 5195 8.7%
 
unknown 1876 3.2%
 
milky 804 1.4%
 
colored 490 0.8%
 
fluoride 217 0.4%
 

Length

Max length: 8
Mean length: 4.23510101
Min length: 4
ValueCountFrequency (%) 
Lowercase_Letter 18 100.0%
 
ValueCountFrequency (%) 
Latin 18 100.0%
 
ValueCountFrequency (%) 
ASCII 18 100.0%
 

quantity
Categorical

HIGH CORRELATION
Distinct count: 5
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.4 MiB
ValueCountFrequency (%) 
enough 33186 55.9%
 
insufficient 15129 25.5%
 
dry 6246 10.5%
 
seasonal 4050 6.8%
 
unknown 789 1.3%
 

Length

Max length12
Mean length7.362373737
Min length3
ValueCountFrequency (%) 
Lowercase_Letter 18 100.0%
 
ValueCountFrequency (%) 
Latin 18 100.0%
 
ValueCountFrequency (%) 
ASCII 18 100.0%
 

quantity_group
Categorical

HIGH CORRELATION
Distinct count 5
Unique (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 3.4 MiB

Value Count Frequency (%)
enough 33186 55.9%
insufficient 15129 25.5%
dry 6246 10.5%
seasonal 4050 6.8%
unknown 789 1.3%

Length

Max length 12
Mean length 7.362373737
Min length 3

Value Count Frequency (%)
Lowercase_Letter 18 100.0%

Value Count Frequency (%)
Latin 18 100.0%

Value Count Frequency (%)
ASCII 18 100.0%

source
Categorical

HIGH CORRELATION
Distinct count 10
Unique (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 3.4 MiB

Value Count Frequency (%)
spring 17021 28.7%
shallow well 16824 28.3%
machine dbh 11075 18.6%
river 9612 16.2%
rainwater harvesting 2295 3.9%
hand dtw 874 1.5%
lake 765 1.3%
dam 656 1.1%
other 212 0.4%
unknown 66 0.1%

Length

Max length 20
Mean length 8.978804714
Min length 3

Value Count Frequency (%)
Lowercase_Letter 20 95.2%
Space_Separator 1 4.8%

Value Count Frequency (%)
Latin 20 95.2%
Common 1 4.8%

Value Count Frequency (%)
ASCII 21 100.0%

source_type
Categorical

HIGH CORRELATION
Distinct count 7
Unique (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 3.4 MiB

Value Count Frequency (%)
spring 17021 28.7%
shallow well 16824 28.3%
borehole 11949 20.1%
river/lake 10377 17.5%
rainwater harvesting 2295 3.9%
dam 656 1.1%
other 278 0.5%

Length

Max length 20
Mean length 9.303602694
Min length 3

Value Count Frequency (%)
Lowercase_Letter 18 90.0%
Space_Separator 1 5.0%
Other_Punctuation 1 5.0%

Value Count Frequency (%)
Latin 18 90.0%
Common 2 10.0%

Value Count Frequency (%)
ASCII 20 100.0%

source_class
Categorical

HIGH CORRELATION
Distinct count 3
Unique (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 3.4 MiB

Value Count Frequency (%)
groundwater 45794 77.1%
surface 13328 22.4%
unknown 278 0.5%

Length

Max length 11
Mean length 10.08377104
Min length 7

Value Count Frequency (%)
Lowercase_Letter 14 100.0%

Value Count Frequency (%)
Latin 14 100.0%

Value Count Frequency (%)
ASCII 14 100.0%

waterpoint_type
Categorical

HIGH CORRELATION
Distinct count 7
Unique (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 3.4 MiB

Value Count Frequency (%)
communal standpipe 28522 48.0%
hand pump 17488 29.4%
other 6380 10.7%
communal standpipe multiple 6103 10.3%
improved spring 784 1.3%
cattle trough 116 0.2%
dam 7 < 0.1%

Length

Max length 27
Mean length 14.82757576
Min length 3

Value Count Frequency (%)
Lowercase_Letter 17 94.4%
Space_Separator 1 5.6%

Value Count Frequency (%)
Latin 17 94.4%
Common 1 5.6%

Value Count Frequency (%)
ASCII 18 100.0%

waterpoint_type_group
Categorical

HIGH CORRELATION
Distinct count 6
Unique (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 3.4 MiB

Value Count Frequency (%)
communal standpipe 34625 58.3%
hand pump 17488 29.4%
other 6380 10.7%
improved spring 784 1.3%
cattle trough 116 0.2%
dam 7 < 0.1%

Length

Max length 18
Mean length 13.90287879
Min length 3

Value Count Frequency (%)
Lowercase_Letter 17 94.4%
Space_Separator 1 5.6%

Value Count Frequency (%)
Latin 17 94.4%
Common 1 5.6%

Value Count Frequency (%)
ASCII 18 100.0%

status_group
Categorical

Distinct count 3
Unique (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 3.4 MiB

Value Count Frequency (%)
functional 32259 54.3%
non functional 22824 38.4%
functional needs repair 4317 7.3%

Length

Max length 23
Mean length 12.48176768
Min length 10

Value Count Frequency (%)
Lowercase_Letter 14 93.3%
Space_Separator 1 6.7%

Value Count Frequency (%)
Latin 14 93.3%
Common 1 6.7%

Value Count Frequency (%)
ASCII 15 100.0%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for an exactly linear relationship the steepness of the line does not affect r (only the sign of the slope does).

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
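
The calculation described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the code pandas-profiling runs; the function name `pearson_r` and the sample data are assumptions made for the example.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Population covariance and standard deviations; the 1/n factors cancel in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)

# Hypothetical sample data: y is an exact increasing linear function of x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]

r_up = pearson_r(x, y)
# Shifting and rescaling x (a change of location and scale) leaves r unchanged.
r_shifted = pearson_r([10 * a + 3 for a in x], y)
# Flipping the sign of y flips the sign of r.
r_down = pearson_r(x, [-b for b in y])
```

Here `r_up` and `r_shifted` are both 1 up to floating-point rounding, and `r_down` is -1, which illustrates the invariance under location and scale changes.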

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
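
As a rough illustration of the rank-based calculation, the sketch below ranks each variable (tied values share the mean of their positions) and then applies the Pearson formula to the ranks. The helper names `average_ranks`, `pearson_r`, and `spearman_rho` and the sample data are assumptions for the example, not part of the report.

```python
import math

def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Pearson's r computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

rho = spearman_rho(x, y)  # monotonic relationship, so rho reaches 1
r = pearson_r(x, y)       # the same data gives a Pearson r below 1
tied = average_ranks([10, 20, 20, 30])
```

On this data ρ is 1 while Pearson's r falls short of 1, which is the sense in which ρ picks up nonlinear monotonic relationships.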

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
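
The pair-counting definition above can be sketched directly. The function name `kendall_tau_a` and the sample data are assumptions for the example; the sketch follows the formula exactly as stated (often called tau-a), whereas library implementations commonly apply a tie-corrected variant, so results may differ when the data contain ties.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # A pair is concordant when x and y move in the same direction,
        # discordant when they move in opposite directions; ties count as neither.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)

# Hypothetical data: one swapped pair out of six possible pairs.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]

tau = kendall_tau_a(x, y)  # 5 concordant, 1 discordant, 6 pairs in total
```

With five concordant pairs and one discordant pair, τ works out to (5 - 1) / 6.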

Missing values

Sample

First rows

(index) | id | amount_tsh | date_recorded | funder | gps_height | installer | longitude | latitude | wpt_name | num_private | basin | subvillage | region | region_code | district_code | lga | ward | population | public_meeting | recorded_by | scheme_management | scheme_name | permit | construction_year | extraction_type | extraction_type_group | extraction_type_class | management | management_group | payment | payment_type | water_quality | quality_group | quantity | quantity_group | source | source_type | source_class | waterpoint_type | waterpoint_type_group | status_group
0 | 69572 | 6000.0 | 2011-03-14 | Roman | 1390 | Roman | 34.938093 | -9.856322 | none | 0 | Lake Nyasa | Mnyusi B | Iringa | 11 | 5 | Ludewa | Mundindi | 109 | True | GeoData Consultants Ltd | VWC | Roman | False | 1999 | gravity | gravity | gravity | vwc | user-group | pay annually | annually | soft | good | enough | enough | spring | spring | groundwater | communal standpipe | communal standpipe | functional
1 | 8776 | 0.0 | 2013-03-06 | Grumeti | 1399 | GRUMETI | 34.698766 | -2.147466 | Zahanati | 0 | Lake Victoria | Nyamara | Mara | 20 | 2 | Serengeti | Natta | 280 | NaN | GeoData Consultants Ltd | Other | NaN | True | 2010 | gravity | gravity | gravity | wug | user-group | never pay | never pay | soft | good | insufficient | insufficient | rainwater harvesting | rainwater harvesting | surface | communal standpipe | communal standpipe | functional
2 | 34310 | 25.0 | 2013-02-25 | Lottery Club | 686 | World vision | 37.460664 | -3.821329 | Kwa Mahundi | 0 | Pangani | Majengo | Manyara | 21 | 4 | Simanjiro | Ngorika | 250 | True | GeoData Consultants Ltd | VWC | Nyumba ya mungu pipe scheme | True | 2009 | gravity | gravity | gravity | vwc | user-group | pay per bucket | per bucket | soft | good | enough | enough | dam | dam | surface | communal standpipe multiple | communal standpipe | functional
3 | 67743 | 0.0 | 2013-01-28 | Unicef | 263 | UNICEF | 38.486161 | -11.155298 | Zahanati Ya Nanyumbu | 0 | Ruvuma / Southern Coast | Mahakamani | Mtwara | 90 | 63 | Nanyumbu | Nanyumbu | 58 | True | GeoData Consultants Ltd | VWC | NaN | True | 1986 | submersible | submersible | submersible | vwc | user-group | never pay | never pay | soft | good | dry | dry | machine dbh | borehole | groundwater | communal standpipe multiple | communal standpipe | non functional
4 | 19728 | 0.0 | 2011-07-13 | Action In A | 0 | Artisan | 31.130847 | -1.825359 | Shuleni | 0 | Lake Victoria | Kyanyamisa | Kagera | 18 | 1 | Karagwe | Nyakasimbi | 0 | True | GeoData Consultants Ltd | NaN | NaN | True | 0 | gravity | gravity | gravity | other | other | never pay | never pay | soft | good | seasonal | seasonal | rainwater harvesting | rainwater harvesting | surface | communal standpipe | communal standpipe | functional
5 | 9944 | 20.0 | 2011-03-13 | Mkinga Distric Coun | 0 | DWE | 39.172796 | -4.765587 | Tajiri | 0 | Pangani | Moa/Mwereme | Tanga | 4 | 8 | Mkinga | Moa | 1 | True | GeoData Consultants Ltd | VWC | Zingibali | True | 2009 | submersible | submersible | submersible | vwc | user-group | pay per bucket | per bucket | salty | salty | enough | enough | other | other | unknown | communal standpipe multiple | communal standpipe | functional
6 | 19816 | 0.0 | 2012-10-01 | Dwsp | 0 | DWSP | 33.362410 | -3.766365 | Kwa Ngomho | 0 | Internal | Ishinabulandi | Shinyanga | 17 | 3 | Shinyanga Rural | Samuye | 0 | True | GeoData Consultants Ltd | VWC | NaN | True | 0 | swn 80 | swn 80 | handpump | vwc | user-group | never pay | never pay | soft | good | enough | enough | machine dbh | borehole | groundwater | hand pump | hand pump | non functional
7 | 54551 | 0.0 | 2012-10-09 | Rwssp | 0 | DWE | 32.620617 | -4.226198 | Tushirikiane | 0 | Lake Tanganyika | Nyawishi Center | Shinyanga | 17 | 3 | Kahama | Chambo | 0 | True | GeoData Consultants Ltd | NaN | NaN | True | 0 | nira/tanira | nira/tanira | handpump | wug | user-group | unknown | unknown | milky | milky | enough | enough | shallow well | shallow well | groundwater | hand pump | hand pump | non functional
8 | 53934 | 0.0 | 2012-11-03 | Wateraid | 0 | Water Aid | 32.711100 | -5.146712 | Kwa Ramadhan Musa | 0 | Lake Tanganyika | Imalauduki | Tabora | 14 | 6 | Tabora Urban | Itetemia | 0 | True | GeoData Consultants Ltd | VWC | NaN | True | 0 | india mark ii | india mark ii | handpump | vwc | user-group | never pay | never pay | salty | salty | seasonal | seasonal | machine dbh | borehole | groundwater | hand pump | hand pump | non functional
9 | 46144 | 0.0 | 2011-08-03 | Isingiro Ho | 0 | Artisan | 30.626991 | -1.257051 | Kwapeto | 0 | Lake Victoria | Mkonomre | Kagera | 18 | 1 | Karagwe | Kaisho | 0 | True | GeoData Consultants Ltd | NaN | NaN | True | 0 | nira/tanira | nira/tanira | handpump | vwc | user-group | never pay | never pay | soft | good | enough | enough | shallow well | shallow well | groundwater | hand pump | hand pump | functional

Last rows

(index) | id | amount_tsh | date_recorded | funder | gps_height | installer | longitude | latitude | wpt_name | num_private | basin | subvillage | region | region_code | district_code | lga | ward | population | public_meeting | recorded_by | scheme_management | scheme_name | permit | construction_year | extraction_type | extraction_type_group | extraction_type_class | management | management_group | payment | payment_type | water_quality | quality_group | quantity | quantity_group | source | source_type | source_class | waterpoint_type | waterpoint_type_group | status_group
59390 | 13677 | 0.0 | 2011-08-04 | Rudep | 1715 | DWE | 31.370848 | -8.258160 | Kwa Mzee Atanas | 0 | Lake Tanganyika | Kitonto | Rukwa | 15 | 2 | Sumbawanga Rural | Mkowe | 150 | True | GeoData Consultants Ltd | VWC | NaN | False | 1991 | swn 80 | swn 80 | handpump | vwc | user-group | never pay | never pay | soft | good | insufficient | insufficient | machine dbh | borehole | groundwater | hand pump | hand pump | functional
59391 | 44885 | 0.0 | 2013-08-03 | Government Of Tanzania | 540 | Government | 38.044070 | -4.272218 | Kwa | 0 | Pangani | Maore Kati | Kilimanjaro | 3 | 3 | Same | Maore | 210 | True | GeoData Consultants Ltd | Water authority | Hingilili | True | 1967 | gravity | gravity | gravity | vwc | user-group | never pay | never pay | soft | good | enough | enough | river | river/lake | surface | communal standpipe | communal standpipe | non functional
59392 | 40607 | 0.0 | 2011-04-15 | Government Of Tanzania | 0 | Government | 33.009440 | -8.520888 | Benard Charles | 0 | Lake Rukwa | Mbuyuni A | Mbeya | 12 | 1 | Chunya | Mbuyuni | 0 | True | GeoData Consultants Ltd | VWC | NaN | True | 0 | gravity | gravity | gravity | vwc | user-group | never pay | never pay | soft | good | enough | enough | spring | spring | groundwater | communal standpipe | communal standpipe | non functional
59393 | 48348 | 0.0 | 2012-10-27 | Private | 0 | Private | 33.866852 | -4.287410 | Kwa Peter | 0 | Internal | Masanga | Tabora | 14 | 2 | Igunga | Igunga | 0 | False | GeoData Consultants Ltd | Water authority | NaN | False | 0 | gravity | gravity | gravity | private operator | commercial | pay per bucket | per bucket | soft | good | insufficient | insufficient | dam | dam | surface | other | other | functional
59394 | 11164 | 500.0 | 2011-03-09 | World Bank | 351 | ML appro | 37.634053 | -6.124830 | Chimeredya | 0 | Wami / Ruvu | Komstari | Morogoro | 5 | 6 | Mvomero | Diongoya | 89 | True | GeoData Consultants Ltd | VWC | NaN | True | 2007 | submersible | submersible | submersible | vwc | user-group | pay monthly | monthly | soft | good | enough | enough | machine dbh | borehole | groundwater | communal standpipe | communal standpipe | non functional
59395 | 60739 | 10.0 | 2013-05-03 | Germany Republi | 1210 | CES | 37.169807 | -3.253847 | Area Three Namba 27 | 0 | Pangani | Kiduruni | Kilimanjaro | 3 | 5 | Hai | Masama Magharibi | 125 | True | GeoData Consultants Ltd | Water Board | Losaa Kia water supply | True | 1999 | gravity | gravity | gravity | water board | user-group | pay per bucket | per bucket | soft | good | enough | enough | spring | spring | groundwater | communal standpipe | communal standpipe | functional
59396 | 27263 | 4700.0 | 2011-05-07 | Cefa-njombe | 1212 | Cefa | 35.249991 | -9.070629 | Kwa Yahona Kuvala | 0 | Rufiji | Igumbilo | Iringa | 11 | 4 | Njombe | Ikondo | 56 | True | GeoData Consultants Ltd | VWC | Ikondo electrical water sch | True | 1996 | gravity | gravity | gravity | vwc | user-group | pay annually | annually | soft | good | enough | enough | river | river/lake | surface | communal standpipe | communal standpipe | functional
59397 | 37057 | 0.0 | 2011-04-11 | NaN | 0 | NaN | 34.017087 | -8.750434 | Mashine | 0 | Rufiji | Madungulu | Mbeya | 12 | 7 | Mbarali | Chimala | 0 | True | GeoData Consultants Ltd | VWC | NaN | False | 0 | swn 80 | swn 80 | handpump | vwc | user-group | pay monthly | monthly | fluoride | fluoride | enough | enough | machine dbh | borehole | groundwater | hand pump | hand pump | functional
59398 | 31282 | 0.0 | 2011-03-08 | Malec | 0 | Musa | 35.861315 | -6.378573 | Mshoro | 0 | Rufiji | Mwinyi | Dodoma | 1 | 4 | Chamwino | Mvumi Makulu | 0 | True | GeoData Consultants Ltd | VWC | NaN | True | 0 | nira/tanira | nira/tanira | handpump | vwc | user-group | never pay | never pay | soft | good | insufficient | insufficient | shallow well | shallow well | groundwater | hand pump | hand pump | functional
59399 | 26348 | 0.0 | 2011-03-23 | World Bank | 191 | World | 38.104048 | -6.747464 | Kwa Mzee Lugawa | 0 | Wami / Ruvu | Kikatanyemba | Morogoro | 5 | 2 | Morogoro Rural | Ngerengere | 150 | True | GeoData Consultants Ltd | VWC | NaN | True | 2002 | nira/tanira | nira/tanira | handpump | vwc | user-group | pay when scheme fails | on failure | salty | salty | enough | enough | shallow well | shallow well | groundwater | hand pump | hand pump | functional